Region Common Uses

Although the type system associated with regions is complicated, there are some simple common idioms. If you understand these idioms, you should be able to easily write programs using regions, and port many legacy C programs to Cyclone. The next subsection will explain the usage patterns of unique and reference-counted pointers, since they are substantially more restrictive than other pointers.

Remember that every pointer points into a region, and although the pointer can be updated, it must always point into that same region (or a region known to outlive that region). The region that the pointer points to is indicated in its type, but omitted regions are filled in by the compiler according to context.

When regions are omitted from pointer types in function bodies, the compiler tries to infer the region. However, it can sometimes be too “eager” and end up rejecting code. For example, in

void f1(int * x) {
  int * y = new 42;
  y = &x;
}

the compiler uses y’s initializer to decide that y’s type is int *`H. Hence the assignment is illegal, the parameter’s region (called `f1) does not outlive the heap. On the other hand, this function type-checks:

void f2(int x) {
  int * y = &x;
  y = new 42;
}

because y’s type is inferred to be int * `f2 and the assignment makes y point into a region that outlives `f2. We can fix our first function by being more explicit:

void f1(int * x) {
  int *`f1 y = new 42;
  y = &x;
}

Function bodies are the only places where the compiler tries to infer the region by how a pointer is used. In function prototypes, type declarations, and top-level global declarations, the rules for the meaning of omitted region annotations are fixed. This is necessary for separate compilation: we often have no information other than the prototype or declaration.

In the absence of region annotations, function-parameter pointers are assumed to point into any possible region. Hence, given

void f(int * x, int * y);

we could call f with two stack pointers, a lexical-region pointer and a heap-pointer, etc. Hence this type is the “most useful” type from the caller’s perspective. But the callee’s body (f) may not type-check with this type. For example, x cannot be assigned a heap pointer because we do not know that x points into the heap. If this is necessary, we must give x the type int *`H. Other times, we may not care what region x and y are in so long as they are the same region. Again, our prototype for f does not indicate this, but we could rewrite it as

void f(int *`r x, int *`r y);

Finally, we may need to refer to the region for x or y in the function body. If we omit the names (relying on the compiler to make up names), then we obviously won’t be able to do so.

Formally, omitted regions in function parameters are filled in by fresh region names and the function is “region polymorphic” over these names (as well as all explicit regions).

In the absence of region annotations, function-return pointers are assumed to point into the heap. Hence the following function will not type-check:

int * f(int * x) { return x; }

Both of these functions will type-check:

int * f(int *`H x) { return x; }
int *`r f(int *`r x) {return x; }

The second one is more useful because it can be called with any region.

In type declarations (including typedef) and top-level variables, omitted region annotations are assumed to point into the heap. In the future, the meaning of typedef may depend on where the typedef is used. In the meantime, the following code will type-check because it is equivalent to the first function in the previous example:

typedef int * foo_t;
foo_t f(foo_t x) { return x; }

If you want to write a function that creates new objects in a region determined by the caller, your function should take a region handle as one of its arguments. The type of a handle is region_t<`r>, where `r is the region information associated with pointers into the region. For example, this function allocates a pair of integers into the region whose handle is r:

  $(int,int)*`r f(region_t<`r> r, int x, int y) { 
     return rnew(r) $(x,y);
  }

Notice that we used the same `r for the handle and the return type. We could have also passed the object back through a pointer parameter like this:

  void f2(region_t<`r> r,int x,int y,$(int,int)*`r *`s p){ 
    *p = rnew(r) $(7,9); 
  }

Notice that we have been careful to indicate that the region where *p lives (corresponding to `s) may be different from the region for which r is the handle (corresponding to `r). Here’s how to use f2:

  { region rgn;
    $(int,int) *`rgn x = NULL; 
    f2(rgn,3,4,&x);
  }

The `s and `rgn in our example are unnecessary because they would be inferred.

typedef, struct, and datatype declarations can all be parameterized by regions, just as they can be parameterized by types. For example, here is part of the list library.

  struct List<`a,`r>{`a hd; struct List<`a,`r> *`r tl;};
  typedef struct List<`a,`r> *`r list_t<`a,`r>;

  // return a fresh copy of the list in r2
  list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a> x) {
    list_t result, prev;

    if (x == NULL) return NULL;
    result = rnew(r2) List{.hd=x->hd,.tl=NULL};
    prev = result;
    for (x=x->tl; x != NULL; x=x->tl) {
      prev->tl = rnew(r2) List(x->hd,NULL);
      prev = prev->tl;
    }
    return result;
  }  
  list_t<`a> copy(list_t<`a> x) {
    return rcopy(heap_region, x);
  }

  // Return the length of a list. 
  int length(list_t x) {
    int i = 0;
    while (x != NULL) {
      ++i;
      x = x->tl;
    }
    return i;
  }

The type list_t<type,rgn> describes pointers to lists whose elements have type type and whose “spines” are in rgn.

The functions are interesting for what they don’t say. Specifically, when types and regions are omitted from a type instantiation, the compiler uses rules similar to those used for omitted regions on pointer types. More explicit versions of the functions would look like this:

  list_t<`a,`r2> rcopy(region_t<`r2> r2, list_t<`a,`r1> x) {
    list_t<`a,`r2> result, prev;
    ...
  }
  list_t<`a,`H> copy(list_t<`a,`r> x) { ... }
  int length(list_t<`a,`r> x) { ... }